Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries. In this work, we propose a new optimal condition training (OCT) method for single-channel target source separation, based on greedy parameter updates using the highest performing condition among equivalent conditions associated with a given target source. Our experiments show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest much more efficiently compared to single-conditioned models. Moreover, we propose a variation of OCT with condition refinement, in which an initial conditional vector is adapted to the given mixture and transformed to a more amenable representation for target source extraction. We showcase the effectiveness of OCT on diverse source separation experiments where it improves upon permutation invariant models with oracle assignment and obtains state-of-the-art performance in the more challenging task of text-based source separation, outperforming even dedicated text-only conditioned models.
translated by 谷歌翻译
本文探讨了培训来生成代码的大型语言模型(LLMS)可以极大地提高对基因编程(GP)应用程序的突变操作员的有效性。由于此类LLM受益于包括顺序更改和修改的训练数据,因此它们可以近似人类会做出的可能变化。为了强调通过大型模型(ELM)的这种进化的含义的广度,在主要实验ELM与MAP-ELITE结合产生了数十万个Python程序的功能示例,这些示例在Sodarace域中输出了在Sodarace域中运行AMBULE的机器人,原始LLM从未在预训练中见过。然后,这些示例有助于引导培训一种新的条件语言模型,该模型可以为特定地形输出合适的步行者。引导新模型可以在以前可用的零培训数据中为给定上下文中输出适当的工件的新模型具有对开放性,深度学习和增强学习的影响。在这里深入探讨了这些含义,以期激发榆树现在打开的新研究方向。
translated by 谷歌翻译
使用单视图2D照片仅集合,无监督的高质量多视图 - 一致的图像和3D形状一直是一个长期存在的挑战。现有的3D GAN是计算密集型的,也是没有3D-一致的近似;前者限制了所生成的图像的质量和分辨率,并且后者对多视图一致性和形状质量产生不利影响。在这项工作中,我们提高了3D GAN的计算效率和图像质量,而无需依赖这些近似。为此目的,我们介绍了一种表现力的混合明确隐式网络架构,与其他设计选择一起,不仅可以实时合成高分辨率多视图一致图像,而且还产生高质量的3D几何形状。通过解耦特征生成和神经渲染,我们的框架能够利用最先进的2D CNN生成器,例如Stylega2,并继承它们的效率和表现力。在其他实验中,我们展示了与FFHQ和AFHQ猫的最先进的3D感知合成。
translated by 谷歌翻译
综合照片 - 现实图像和视频是计算机图形的核心,并且是几十年的研究焦点。传统上,使用渲染算法(如光栅化或射线跟踪)生成场景的合成图像,其将几何形状和材料属性的表示为输入。统称,这些输入定义了实际场景和呈现的内容,并且被称为场景表示(其中场景由一个或多个对象组成)。示例场景表示是具有附带纹理的三角形网格(例如,由艺术家创建),点云(例如,来自深度传感器),体积网格(例如,来自CT扫描)或隐式曲面函数(例如,截短的符号距离)字段)。使用可分辨率渲染损耗的观察结果的这种场景表示的重建被称为逆图形或反向渲染。神经渲染密切相关,并将思想与经典计算机图形和机器学习中的思想相结合,以创建用于合成来自真实观察图像的图像的算法。神经渲染是朝向合成照片现实图像和视频内容的目标的跨越。近年来,我们通过数百个出版物显示了这一领域的巨大进展,这些出版物显示了将被动组件注入渲染管道的不同方式。这种最先进的神经渲染进步的报告侧重于将经典渲染原则与学习的3D场景表示结合的方法,通常现在被称为神经场景表示。这些方法的一个关键优势在于它们是通过设计的3D-一致,使诸如新颖的视点合成捕获场景的应用。除了处理静态场景的方法外,我们还涵盖了用于建模非刚性变形对象的神经场景表示...
translated by 谷歌翻译
Accurate determination of a small molecule candidate (ligand) binding pose in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the pocket flexibility of protein, while the more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening, requiring neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software, and more importantly achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. 3T structure transformation is decoupled from the system physics, making future usage in other computational scientific domains possible.
translated by 谷歌翻译
Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
translated by 谷歌翻译
The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.
translated by 谷歌翻译
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
translated by 谷歌翻译
The Covid-19 pandemic induced a vast increase in adolescents diagnosed with eating disorders and hospitalized due to eating disorders. This immense growth stemmed partially from the stress of the pandemic but also from increased exposure to content that promotes eating disorders via social media, which, within the last decade, has become plagued by pro-eating disorder content. This study aimed to create a deep learning model capable of determining whether a given social media post promotes eating disorders based solely on image data. Tweets from hashtags that have been documented to promote eating disorders along with tweets from unrelated hashtags were collected. After prepossessing, these images were labeled as either pro-eating disorder or not based on which Twitter hashtag they were scraped from. Several deep-learning models were trained on the scraped dataset and were evaluated based on their accuracy, F1 score, precision, and recall. Ultimately, the vision transformer model was determined to be the most accurate, attaining an F1 score of 0.877 and an accuracy of 86.7% on the test set. The model, which was applied to unlabeled Twitter image data scraped from "#selfie", uncovered seasonal fluctuations in the relative abundance of pro-eating disorder content, which reached its peak in the summertime. These fluctuations correspond not only to the seasons, but also to stressors, such as the Covid-19 pandemic. Moreover, the Twitter image data indicated that the relative amount of pro-eating disorder content has been steadily rising over the last five years and is likely to continue increasing in the future.
translated by 谷歌翻译
We introduce a pivot for exact selective inference with randomization. Not only does our pivot lead to exact inference in Gaussian regression models, but it is also available in closed form. We reduce the problem of exact selective inference to a bivariate truncated Gaussian distribution. By doing so, we give up some power that is achieved with approximate inference in Panigrahi and Taylor (2022). Yet we always produce narrower confidence intervals than a closely related data-splitting procedure. For popular instances of Gaussian regression, this price -- in terms of power -- in exchange for exact selective inference is demonstrated in simulated experiments and in an HIV drug resistance analysis.
translated by 谷歌翻译